Introduction
This final project highlights some of the techniques acquired during the SDS course, in particular the construction and simulation of nonlinear regression and classification models with tools such as rjags and MCMC. To this end we consider two models for nonlinear regression and two models for binary classification of the state of stars: primordial or advanced.
The dataset is taken from Kaggle and contains data on six different types of stars, classified according to features such as:
- Absolute Temperature (in K)
- Relative Luminosity (L/Lo)
- Relative Radius (R/Ro)
- Absolute Magnitude (Mv)
- Star Color (white, Red, Blue, Yellow, yellow-orange, etc.)
- Spectral Class (O,B,A,F,G,K,M)
- Star Type (0: Red Dwarf, 1: Brown Dwarf, 2: White Dwarf, 3: Main Sequence, 4: SuperGiants, 5: HyperGiants)
The Hertzsprung–Russell diagram
The Hertzsprung–Russell diagram is a scatter plot of stars showing the relationship between their absolute magnitudes or luminosities and their stellar classifications or effective temperatures. It proved very useful for understanding the data, as it gives a fast, readable visualization of the star classes and their reference parameters. After a first exploration, I plotted the correlation matrix to get an immediate picture of the relationships between the input variables; non-numerical variables were excluded from this visualization.
Correlation Plot
ggcorrplot(corr, method = "circle")

As the correlation plot of the numerical input variables shows, temperature, luminosity, and radius are positively correlated, since stars in these six classes are celestial bodies that steadily increase in size even in their final life stage. Absolute magnitude, by contrast, is negatively correlated with the other variables: in the final phase of a star's life, although its radius is significantly greater than in all other phases, its brightness is almost zero. The radius also shows only a weak correlation with temperature, since high-mass stars tend not to have high temperatures in the last two life phases.
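As a minimal sketch (on a toy data frame, not the project data), the correlation matrix passed to ggcorrplot can be built by keeping only the numeric columns and calling cor():

```r
# Toy sketch (not the project data): build a correlation matrix
# from numeric columns only, as done before plotting with ggcorrplot.
set.seed(1)
toy <- data.frame(
  Temperature = runif(50, 2000, 40000),
  Magnitude   = runif(50, -10, 20),
  Color       = sample(c("Red", "Blue"), 50, replace = TRUE)  # non-numeric, excluded
)
num_cols <- sapply(toy, is.numeric)   # keep only numeric variables
corr_toy <- cor(toy[, num_cols])      # Pearson correlation matrix
round(corr_toy, 2)
```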
Features Distribution
grid.arrange(plt.dens.temp,plt.dens.lum,plt.dens.radius,plt.dens.magn,ncol=2,nrow=2)

Abs.Magnitude vs Temperature Stars Clustering
Below I report a plot that defines the clusters of the various stars as a function of absolute magnitude.
plt1.6class

Main Sequence Data Restriction:
As shown in the clustering plot, the main sequence is the stage of a star's lifetime to which I restrict my analysis. The cores of main-sequence stars are in hydrostatic equilibrium: outward thermal pressure from the hot core is balanced by the inward pressure of gravitational collapse from the overlying layers. The strong dependence of the rate of energy generation on temperature and pressure helps to sustain this balance. Thanks to this characteristic balance, the variables follow a decreasing functional trend that, from now on, I will investigate with two models that allow me to fit the data and predict new observations.
grid.arrange(plt.dens.temp.res,plt.dens.lum.res,plt.dens.radius.res,plt.dens.magn.res,ncol=2,nrow=2)

Model I
In the first model, the decreasing trend of the data is described by a nonlinear model, distributed as a Normal whose mean is a decaying exponential function. An always-positive function such as the exponential was not enough, because the magnitude also reaches negative values, so I introduced another (root) function. The combination of the two produces the desired trend: in the early stages the exponential dominates, while later, as its effect vanishes, the root function takes over and completes the decreasing shape.
We assume that each observation \(Y_i\) is drawn from a Normal distribution with mean \(\mu_i\) and precision \(\tau = \frac{1}{\sigma^2}\).
Likelihood function: \[\begin{equation*}
Y_i \sim \mathcal{N}(\mu_i,\tau^{-1})
\end{equation*}\]
\[\begin{equation*}
\mu_i = f(x_i)= \alpha(x_i - k)^{\beta} + \gamma e^{\delta(x_i -k)}
\end{equation*}\] for \(i = 1,\dots,40\)
\[\begin{align*}
L(\alpha,\beta,\gamma,\delta,k,\tau \mid y,x) = \prod_{i=1}^n \Big(\frac{\tau}{2\pi}\Big)^{\frac{1}{2}} \exp\bigg\{-\frac{\tau (y_i-\mu_i)^2}{2} \bigg\} =
\end{align*}\]
\[\begin{align*}
= \Big(\frac{\tau}{2\pi}\Big)^{\frac{n}{2}} \exp\bigg\{-\frac{\tau}{2}\sum_{i=1}^{n}(y_i-\mu_i)^2 \bigg\}=
\end{align*}\]
\[\begin{align*}
= \Big(\frac{\tau}{2\pi}\Big)^{\frac{n}{2}} \exp\bigg\{-\frac{\tau}{2}\sum_{i=1}^{n}\Big(y_i-\big[\alpha(x_i - k)^{\beta} + \gamma e^{\delta(x_i -k)}\big]\Big)^2 \bigg\}
\end{align*}\]
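The mean function \(f\) can be written as plain R (the parameter values below are arbitrary, chosen only to show the shape of the curve):

```r
# The mean function of Model I: a power (root) term plus an exponential term.
f_mu <- function(x, alpha, beta, gamma, delta, k) {
  alpha * (x - k)^beta + gamma * exp(delta * (x - k))
}
# With delta < 0 the exponential term decays and the power term takes over.
f_mu(2, alpha = 1, beta = 1, gamma = 1, delta = 0, k = 0)  # 1*2^1 + 1*exp(0) = 3
```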
A choice for the prior of the precision \(\tau\) is the Gamma distribution \(\Gamma(a,b)\) with \(a,b \in \mathbb{R}_0^+\),
so the prior can be written as:
\[\begin{align*}
\pi(\tau) = \frac{b^a}{\Gamma(a)} \tau^{a-1}e^{-b\tau}
\end{align*}\]
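As a quick sanity check, the written density matches R's dgamma with shape \(a\) and rate \(b\) (values of \(a\), \(b\), \(\tau\) below are arbitrary):

```r
# Check that the Gamma prior density written above,
# b^a / Gamma(a) * tau^(a-1) * exp(-b*tau), matches dgamma.
a <- 2; b <- 3; tau <- 0.7
manual <- b^a / gamma(a) * tau^(a - 1) * exp(-b * tau)
stopifnot(isTRUE(all.equal(manual, dgamma(tau, shape = a, rate = b))))
```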
model
{
  for( i in 1:N ) {
    Y[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha*pow((x[i]-k),beta) + gamma*exp(delta*(x[i]-k))
  }
  alpha ~ dnorm(1, 1.0E-1)
  beta ~ dbeta(1.0, 1.0)
  gamma ~ dnorm(0, 1.0E-1)
  delta ~ dnorm(0, 1.0E-1)
  tau ~ dgamma(0.001, 0.001)
  k ~ dunif(0,min(x))
  sigma <- 1 / sqrt(tau)
}
Synthetic Data Generation
Having defined the model, I generated some synthetic data to study how well the model can recover the parameters; as shown below, it succeeds by a good margin.
################ SYNTHETIC DATA GENERATION FOR EXPONENTIAL MODEL
sintetic_data_exp <- function(alpha,beta,gamma,delta,k,n_sample,noise_factor,left_bound,right_bound){
  x <- runif(n_sample, min = left_bound, max = right_bound)
  tau <- rgamma(n_sample, 0.001, 0.001)               # precision draws
  mu <- alpha*((x - k)^beta) + gamma*exp(delta*(x - k))
  y <- rnorm(n_sample, mu, 1/sqrt(tau))               # rnorm takes the sd, i.e. 1/sqrt(precision)
  y_noise <- jitter(y, factor = noise_factor, amount = NULL)
  ord <- order(x)                                     # sort by x so the (x, y) pairs stay aligned
  return(as.data.frame(list("Temperature" = x[ord],
                            "Magnitude" = y[ord],
                            "Magnitude_w_noise" = y_noise[ord])))
}
sn_data_exp = sintetic_data_exp(alpha=2,beta=.5,gamma=2,delta=-50,k=.5,n_sample=400,noise_factor=0,left_bound=0.5,right_bound=4.6)
Testing Model I on Synthetic Data
We now fit the model to the synthetic data and see how it behaves. The evaluation and comparison of the two models will be done using the DIC: the model with the lower DIC indicates a better model-data fit.
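The "rule" reported by JAGS in the output below, pD = var(deviance)/2 and DIC = mean(deviance) + pD, can be sketched in a few lines (the deviance values here are made up for illustration):

```r
# DIC from posterior deviance draws, using the rule pD = var(deviance)/2.
dic_from_deviance <- function(dev) {
  pD <- var(dev) / 2
  c(pD = pD, DIC = mean(dev) + pD)
}
dic_from_deviance(c(10, 12, 14))  # pD = 4/2 = 2, DIC = 12 + 2 = 14
```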
syn_data_j_exp <- list("Y"=sn_data_exp$Magnitude,"x"=sn_data_exp$Temperature,"N"=40)
inits = list(list(alpha = 1.2, beta = 0.2, gamma = 0.2,delta=-50, tau = 1),
list(alpha = 1.2, beta = 0.2, gamma = 0.2,delta=-50, tau = 1))
params = c("alpha","beta","gamma","delta","k","tau")
model_exp_f=jags(data=syn_data_j_exp, inits=inits,
parameters.to.save=params,
model.file="models/exp_model.txt",
n.chains=2,
n.burnin=1000,
n.iter=10000,
n.thin = 10)
## module glm loaded
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 40
## Unobserved stochastic nodes: 6
## Total graph size: 1098
##
## Initializing model
model_exp_f
## Inference for Bugs model at "models/exp_model.txt", fit using jags,
## 2 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
## n.sims = 1800 iterations saved
## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
## alpha 3.354 1.848 1.421 1.758 2.780 4.681 7.495 1.032 53
## beta 0.196 0.210 0.009 0.018 0.036 0.360 0.693 1.037 48
## delta -35.610 41.674 -98.741 -83.984 -0.776 -0.386 -0.205 1.006 250
## gamma -3.555 1.883 -7.797 -4.623 -3.454 -2.420 -0.265 1.001 1800
## k 0.481 0.046 0.308 0.479 0.501 0.501 0.501 1.337 17
## tau 165.236 139.584 19.738 34.595 154.672 278.295 443.913 1.055 34
## deviance -71.348 43.051 -124.315 -112.835 -96.397 -29.168 -13.969 1.049 38
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 901.9 and DIC = 830.6
## DIC is an estimate of expected predictive error (lower deviance is better).
chainArray.exp.f <- model_exp_f$BUGSoutput$sims.array
Testing Model I on Real Data
real_data_exp <- list("Y" = df_stars_exp$Absolute.magnitude.Mv.,"x" = log(df_stars_exp$Temperature..K.,base=10), "N" =length(df_stars$Temperature..K.))
inits = list(list(alpha = 2, beta = .2,gamma = 0.2,delta=-1.5, tau = .1),
list(alpha = 2.5, beta = .2,gamma = 0.2,delta=-1.5, tau = .1))
params = c("alpha","beta","gamma","delta","k")
model_exp=jags(data=real_data_exp, inits=inits,
parameters.to.save=params,
model.file="models/exp_model.txt",
n.chains=2,
n.burnin=1000,
n.iter=10000,
n.thin = 10)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 40
## Unobserved stochastic nodes: 6
## Total graph size: 378
##
## Initializing model
model_exp
## Inference for Bugs model at "models/exp_model.txt", fit using jags,
## 2 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
## n.sims = 1800 iterations saved
## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
## alpha -5.940 0.800 -7.803 -6.259 -5.788 -5.436 -4.910 1.081 59
## beta 0.752 0.164 0.400 0.644 0.775 0.889 0.987 1.009 170
## delta -2.242 0.534 -3.264 -2.619 -2.256 -1.878 -1.215 1.027 84
## gamma 8.680 1.232 6.922 7.816 8.458 9.293 11.658 1.002 1100
## k 3.569 0.036 3.477 3.553 3.578 3.595 3.609 1.004 420
## deviance 93.936 3.213 89.899 91.472 93.236 95.602 101.782 1.018 110
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 5.1 and DIC = 99.1
## DIC is an estimate of expected predictive error (lower deviance is better).
chainArray.exp <- model_exp$BUGSoutput$sims.array
coda.fit.exp <- coda::as.mcmc(model_exp)
Density Plot Model I
Let’s have a look at the parameter distributions.
bayesplot::mcmc_combo(coda.fit.exp)

Running Mean Plot Model I
The trend seen in the trace plots is even more evident if we consider the convergence of the running mean.
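A running (cumulative) mean is easy to compute with cumsum; for a converging chain this curve flattens toward the posterior mean:

```r
# Running mean of a chain: cumulative sum divided by the draw index.
running_mean <- function(chain) cumsum(chain) / seq_along(chain)
running_mean(c(1, 2, 3, 4))  # 1.0 1.5 2.0 2.5
```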
grid.arrange(plt.alpha,plt.beta,plt.gamma,plt.delta,plt.k)

Autocorrelation Plot Model I
An important aspect to take into account in these problems is the autocorrelation between successive draws: ideally we expect the autocorrelation to go to zero, as this implies that the new points generated are independent and uncorrelated. For this reason the value of n.thin was set equal to 10, meaning that we keep one point out of every ten to reduce possible correlation.
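The thinning step can be sketched directly (indices are illustrative; r2jags applies this internally via n.thin):

```r
# Thinning: keep one draw every n.thin, trading saved samples for
# lower autocorrelation between the retained draws.
thin_chain <- function(chain, n.thin) chain[seq(1, length(chain), by = n.thin)]
length(thin_chain(1:10000, 10))  # 1000 draws kept
```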
coda.fit <- coda::as.mcmc(model_exp)
coda::acfplot(coda.fit.exp)

Model Quantitative Diagnostic Model I
Raftery & Lewis Test for Model I
Raftery and Lewis’s diagnostic is a run-length control diagnostic based on a criterion of accuracy of estimation of the quantile q. The algorithm looks for the smallest thinning interval k that makes \(Z_t\) (the binarization of the chain \(\theta_t\) with respect to the quantile of interest) behave as if it came from an independent 2-state Markov chain.
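The binarization \(Z_t\) described above can be illustrated as follows (the chain here is a placeholder, not the model output):

```r
# Binarize a chain with respect to its empirical q-quantile:
# Z_t = 1 if the draw falls below the quantile, 0 otherwise.
set.seed(42)
chain <- rnorm(10000)
q <- 0.025
Z <- as.integer(chain <= quantile(chain, q))
mean(Z)  # close to q by construction
```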
coda::raftery.diag(coda.fit)
## [[1]]
##
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95
##
## You need a sample size of at least 3746 with these values of q, r and s
##
## [[2]]
##
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95
##
## You need a sample size of at least 3746 with these values of q, r and s
The algorithm returns the sample size N required to estimate the quantile q within an accuracy r with probability s.
Geweke Test for Model I
Geweke’s diagnostic is a convergence test for Markov chains based on testing the equality of the means of the first and last part of a chain (by default the first 10% and the last 50%). If the samples are drawn from the stationary distribution of the chain, the two means are equal and Geweke’s statistic has an asymptotically standard normal distribution. The test statistic is a standard Z-score: the difference between the two sample means divided by its estimated standard error. The standard error is estimated from the spectral density at zero, and so takes any autocorrelation into account. The Z-score is calculated under the assumption that the two parts of the chain are asymptotically independent. As the plots show, the Z-score values lie mostly in the acceptance region.
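A naive version of the Z-score can be sketched as below; note this is illustrative only, since the real diagnostic estimates the standard error from the spectral density at zero rather than from the plain sample variances:

```r
# Naive Geweke-style Z-score: compare the means of the first frac1 and
# last frac2 of the chain, with a plain (iid) standard error estimate.
geweke_z <- function(chain, frac1 = 0.1, frac2 = 0.5) {
  n <- length(chain)
  a <- chain[1:floor(frac1 * n)]
  b <- chain[(n - floor(frac2 * n) + 1):n]
  (mean(a) - mean(b)) / sqrt(var(a)/length(a) + var(b)/length(b))
}
set.seed(7)
geweke_z(rnorm(10000))  # for a stationary chain, |Z| is typically below 2
```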
coda::geweke.diag(coda.fit)
## [[1]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## alpha beta delta deviance gamma k
## -1.3868 -0.8031 1.1550 -0.2298 -2.1974 1.6494
##
##
## [[2]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## alpha beta delta deviance gamma k
## 0.29983 -0.32439 0.01066 -1.25797 -1.78932 1.63398
coda::geweke.plot(coda.fit)


Gelman Test for Model I
The potential scale reduction factor is calculated for each variable in x, with upper and lower confidence limits. Approximate convergence is diagnosed when the upper limit is close to 1.
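For two chains, the quantity estimated by gelman.diag can be hand-rolled as below (a simplified version, without the degrees-of-freedom correction coda applies):

```r
# Potential scale reduction factor for two chains of equal length n:
# compares within-chain variance W to between-chain variance B.
psrf <- function(c1, c2) {
  n <- length(c1)
  W <- (var(c1) + var(c2)) / 2               # within-chain variance
  B <- n * var(c(mean(c1), mean(c2)))        # between-chain variance
  sqrt(((n - 1)/n * W + B/n) / W)
}
set.seed(3)
psrf(rnorm(5000), rnorm(5000))  # close to 1 for well-mixed chains
```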
coda::gelman.diag(coda.fit)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## alpha 1.16 1.54
## beta 1.02 1.08
## delta 1.09 1.32
## deviance 1.04 1.18
## gamma 1.01 1.05
## k 1.01 1.03
##
## Multivariate psrf
##
## 1.07
Heidel Test for Model I
The Heidelberger and Welch diagnostic assesses convergence by testing the null hypothesis (at level 0.05) that the samples of the Markov chain are drawn from a stationary distribution. A “failure” of the stationarity test indicates that a longer MCMC run is needed. The half-width test calculates a 95% confidence interval for the mean, takes half of the width of this interval, and compares this value with the estimate of the mean.
coda::heidel.diag(coda.fit)
## [[1]]
##
## Stationarity start p-value
## test iteration
## alpha passed 1 0.433
## beta passed 1 0.477
## delta passed 1 0.244
## deviance passed 1 0.968
## gamma passed 1 0.436
## k passed 1 0.513
##
## Halfwidth Mean Halfwidth
## test
## alpha passed -5.835 0.1070
## beta passed 0.766 0.0238
## delta passed -2.301 0.0832
## deviance passed 93.626 0.2153
## gamma passed 8.718 0.2170
## k passed 3.567 0.0060
##
## [[2]]
##
## Stationarity start p-value
## test iteration
## alpha passed 1 0.492
## beta passed 1 0.287
## delta passed 1 0.368
## deviance passed 1 0.117
## gamma passed 1 0.688
## k passed 1 0.970
##
## Halfwidth Mean Halfwidth
## test
## alpha passed -6.045 0.20256
## beta passed 0.739 0.02143
## delta passed -2.184 0.09854
## deviance passed 94.245 0.33520
## gamma passed 8.643 0.16863
## k passed 3.571 0.00511
Posterior Mean Point Estimate
chainArray <- model_exp$BUGSoutput$sims.matrix
colMeans(chainArray)
## alpha beta delta deviance gamma k
## -5.9398618 0.7523921 -2.2423108 93.9359244 8.6804553 3.5689866
Approximation Error
approx.error.exp.chain1
## alpha1 beta1 gamma1 delta1 k1
## 1 0.0004519604 2.914493e-05 0.001739852 0.0002725208 1.444814e-06
approx.error.exp.chain2
## alpha2 beta2 gamma2 delta2 k2
## 1 0.0009458643 3.036976e-05 0.001634321 0.0003550699 1.362142e-06
Equal-Tailed Credible Interval
cred <- 0.95
p.ET.jags <- apply(chainArray, 2, quantile,
prob=c((1-cred)/2, 1-(1-cred)/2))
p.ET.jags
## alpha beta delta deviance gamma k
## 2.5% -7.803246 0.4004705 -3.263854 89.8989 6.922236 3.476738
## 97.5% -4.909834 0.9866157 -1.214734 101.7822 11.658286 3.608697
HPD
p.HPD.jags <- coda::HPDinterval(as.mcmc(chainArray))
p.HPD.jags
## lower upper
## alpha -7.458420 -4.7885662
## beta 0.449416 0.9990595
## delta -3.248838 -1.2072610
## deviance 89.667320 100.7858486
## gamma 6.733745 11.2139157
## k 3.494732 3.6102937
## attr(,"Probability")
## [1] 0.95
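An HPD interval like the one above can also be found empirically as the shortest interval containing a fraction cred of the sorted draws; a minimal sketch (on placeholder draws, not the chain):

```r
# Empirical HPD interval: slide a window of m = floor(cred * n) draws
# along the sorted sample and keep the shortest one.
hpd_empirical <- function(draws, cred = 0.95) {
  s <- sort(draws)
  n <- length(s)
  m <- floor(cred * n)
  widths <- s[(m + 1):n] - s[1:(n - m)]
  i <- which.min(widths)
  c(lower = s[i], upper = s[i + m])
}
set.seed(11)
hpd_empirical(rnorm(10000))  # roughly (-1.96, 1.96) for a standard normal
```

For symmetric posteriors the HPD and equal-tailed intervals nearly coincide, which is why the two tables above are so similar.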
Model II
model
{
for( i in 1:N ) {
Y[i] ~ dnorm(mu[i], tau)
mu[i] <- alpha + beta*x[i] + gamma*(x[i]^2)
}
alpha ~ dnorm(0.0, 1.0E-12)
beta ~ dnorm(0.0, 1.0E-12)
gamma ~ dnorm(0.0, 1.0E-12)
tau ~ dgamma(0.001, 0.001)
sigma <- 1 / sqrt(tau)
}
Testing Model II on Synthetic Data
################################ SYNTHETIC DATA GENERATION
sintetic_data <- function(alpha,beta,gamma,n_sample,noise_factor,left_bound,right_bound){
  x <- runif(n_sample, min = left_bound, max = right_bound)
  tau <- rgamma(n_sample, 0.001, 0.001)        # precision draws
  mu <- alpha + beta*x + gamma*(x^2)
  y <- rnorm(n_sample, mu, 1/sqrt(tau))        # rnorm takes the sd, i.e. 1/sqrt(precision)
  y_noise <- jitter(y, factor = noise_factor, amount = NULL)
  return(as.data.frame(list("Temperature" = x, "Magnitude" = y, "Magnitude_w_noise" = y_noise)))
}
sn_data = sintetic_data(alpha=-1,beta=-.8,gamma=.8,n_sample=40,noise_factor=800,left_bound=-2,right_bound=0.5)
################################ TESTING ON SYNTHETIC DATA AND RETRIEVING PARAMETERS
syn_data_j <- list("Y"=sn_data$Magnitude_w_noise,"x"=sn_data$Temperature,"N"=40)
inits = list(list(alpha = 1.2, beta = 0.2, gamma = 0.2, tau = 1),
list(alpha = 1.2, beta = 0.2, gamma = 0.2, tau = 1))
params = c("alpha","beta","gamma","tau")
model_quad=jags(data=syn_data_j, inits=inits,
parameters.to.save=params,
model.file="models/quadratic_model.txt",
n.chains=2,
n.burnin=1000,
n.iter=10000,
n.thin = 10)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 40
## Unobserved stochastic nodes: 4
## Total graph size: 252
##
## Initializing model
model_quad
## Inference for Bugs model at "models/quadratic_model.txt", fit using jags,
## 2 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
## n.sims = 1800 iterations saved
## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
## alpha -1.018 0.026 -1.072 -1.035 -1.019 -1.001 -0.966 1.001 1800
## beta -0.853 0.061 -0.973 -0.894 -0.852 -0.811 -0.735 1.001 1800
## gamma 0.778 0.035 0.712 0.756 0.777 0.802 0.845 1.001 1800
## tau 113.673 27.674 66.546 94.315 111.427 130.226 174.514 1.002 1800
## deviance -74.905 3.424 -78.674 -77.118 -75.528 -73.413 -67.939 1.037 560
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 5.9 and DIC = -69.1
## DIC is an estimate of expected predictive error (lower deviance is better).
Testing Model II on Real Data
real_data_quad <- list("Y" = df_stars_quad$Absolute.magnitude.Mv.,"x" = log(df_stars_quad$Temperature..K.,base=10), "N" =length(df_stars$Temperature..K.))
inits = list(list(alpha = 1.2, beta = 1.2, gamma = 0.2, tau = 1),
list(alpha = 1.2, beta = 1.2, gamma = 0.2, tau = 1))
params = c("alpha", "beta", "gamma", "tau")
model_quad=jags(data=real_data_quad, inits=inits,
parameters.to.save=params,
model.file="models/quadratic_model.txt",
n.chains=2,
n.burnin=1000,
n.iter=10000,
n.thin = 10)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 40
## Unobserved stochastic nodes: 4
## Total graph size: 252
##
## Initializing model
model_quad
## Inference for Bugs model at "models/quadratic_model.txt", fit using jags,
## 2 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
## n.sims = 1800 iterations saved
## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
## alpha 218.652 26.122 165.694 201.607 218.960 236.229 269.680 1.001 1800
## beta -95.257 12.732 -119.898 -103.803 -95.466 -86.918 -69.358 1.001 1500
## gamma 10.157 1.545 7.007 9.149 10.181 11.191 13.133 1.001 1700
## tau 1.870 0.428 1.106 1.567 1.838 2.135 2.789 1.002 1500
## deviance 89.474 2.886 85.934 87.421 88.815 90.868 96.505 1.001 1800
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 4.2 and DIC = 93.6
## DIC is an estimate of expected predictive error (lower deviance is better).
chainArray.quad <- model_quad$BUGSoutput$sims.array
coda.fit.quad <- coda::as.mcmc(model_quad)
Density Plot Model II
bayesplot::mcmc_combo(coda.fit.quad)

Running Mean Plot Model II
grid.arrange(plt.alpha,plt.beta,plt.gamma)

Autocorrelation Plot Model II
coda::acfplot(coda.fit.quad)

Model II Quantitative Diagnostic
Raftery & Lewis Test for Model II
coda::raftery.diag(coda.fit.quad)
## [[1]]
##
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95
##
## You need a sample size of at least 3746 with these values of q, r and s
##
## [[2]]
##
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95
##
## You need a sample size of at least 3746 with these values of q, r and s
Geweke Test for Model II
coda::geweke.diag(coda.fit.quad)
## [[1]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## alpha beta deviance gamma tau
## -0.07315 0.05418 -1.23611 -0.04208 0.68600
##
##
## [[2]]
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## alpha beta deviance gamma tau
## -0.5765 0.5954 -1.1034 -0.6107 -1.1262
Gelman Test for Model II
coda::gelman.diag(coda.fit.quad)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## alpha 1 1.00
## beta 1 1.00
## deviance 1 1.00
## gamma 1 1.00
## tau 1 1.02
##
## Multivariate psrf
##
## 1.01
Heidel Test for Model II
coda::heidel.diag(coda.fit.quad)
## [[1]]
##
## Stationarity start p-value
## test iteration
## alpha passed 1 0.890
## beta passed 1 0.960
## deviance passed 1 0.213
## gamma passed 1 0.967
## tau passed 1 0.367
##
## Halfwidth Mean Halfwidth
## test
## alpha passed 219.31 1.6205
## beta passed -95.59 0.7888
## deviance passed 89.48 0.1941
## gamma passed 10.20 0.0957
## tau passed 1.86 0.0282
##
## [[2]]
##
## Stationarity start p-value
## test iteration
## alpha passed 1 0.840
## beta passed 1 0.823
## deviance passed 1 0.626
## gamma passed 1 0.806
## tau passed 1 0.673
##
## Halfwidth Mean Halfwidth
## test
## alpha passed 217.99 1.6864
## beta passed -94.93 0.8219
## deviance passed 89.47 0.1830
## gamma passed 10.12 0.0997
## tau passed 1.88 0.0303
Posterior Mean Point Estimate Model II
chainArray.quad <- model_quad$BUGSoutput$sims.matrix
colMeans(chainArray.quad)
## alpha beta deviance gamma tau
## 218.651588 -95.256855 89.474228 10.157337 1.870342
Equal-Tailed Credible Interval Model II
cred <- 0.95
p.ET.jags <- apply(chainArray.quad, 2, quantile,
prob=c((1-cred)/2, 1-(1-cred)/2))
p.ET.jags
## alpha beta deviance gamma tau
## 2.5% 165.6943 -119.89850 85.93388 7.007062 1.106489
## 97.5% 269.6802 -69.35847 96.50507 13.133205 2.789119
HPD Model II
p.HPD.jags <- coda::HPDinterval(as.mcmc(chainArray.quad))
p.HPD.jags
## lower upper
## alpha 169.435812 272.344182
## beta -121.337211 -70.981509
## deviance 85.592845 94.828928
## gamma 7.271441 13.389959
## tau 1.072382 2.718793
## attr(,"Probability")
## [1] 0.95
Which regression model is best?
data.frame("Model" = c("Exponential", "Quadratic"),
"DIC" = c(model_exp$BUGSoutput$DIC,model_quad$BUGSoutput$DIC))
## Model DIC
## 1 Exponential 99.05368
## 2 Quadratic 93.64199
Models Comparison In Predictions
Finally, we can compare how the two models behave in prediction. For some held-out points, we can predict using the posterior predictive distribution:
\[\begin{align*}
f(y^{pred}\mid y) = \int_\Theta f(y^{pred}\mid \theta)\, \pi(\theta \mid y)\, d\theta
\end{align*}\] where \(\Theta\) is the parameter space, \(y^{pred}\) are the predictions and \(x_i\) are the input features. From the two plots we can observe that the quadratic model works better (as the DIC suggested): the predictions agree more closely with the quadratic fit than with the mixed exponential fit.
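The posterior predictive integral can be approximated by Monte Carlo: for each saved draw of the parameters, simulate a new observation at the new input. The sketch below uses fabricated draws standing in for the saved MCMC samples of the quadratic model:

```r
# Monte Carlo posterior predictive sketch for the quadratic model.
# The "posterior draws" below are fabricated for illustration only.
set.seed(5)
n_draws <- 2000
draws <- data.frame(alpha = rnorm(n_draws, 218.7, 26.1),
                    beta  = rnorm(n_draws, -95.3, 12.7),
                    gamma = rnorm(n_draws, 10.2, 1.5),
                    tau   = rgamma(n_draws, 20, 10))
x_new <- 4.0  # a hypothetical log10 temperature
mu_new <- draws$alpha + draws$beta * x_new + draws$gamma * x_new^2
y_pred <- rnorm(n_draws, mu_new, 1/sqrt(draws$tau))  # one draw per parameter sample
quantile(y_pred, c(0.025, 0.975))  # 95% posterior predictive interval
```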
grid.arrange(plt.c_e,plt.c_q,nrow=1)

Frequentist Approach vs Bayesian Approach Comparison
model_quad
## Inference for Bugs model at "models/quadratic_model.txt", fit using jags,
## 2 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
## n.sims = 1800 iterations saved
## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
## alpha 218.652 26.122 165.694 201.607 218.960 236.229 269.680 1.001 1800
## beta -95.257 12.732 -119.898 -103.803 -95.466 -86.918 -69.358 1.001 1500
## gamma 10.157 1.545 7.007 9.149 10.181 11.191 13.133 1.001 1700
## tau 1.870 0.428 1.106 1.567 1.838 2.135 2.789 1.002 1500
## deviance 89.474 2.886 85.934 87.421 88.815 90.868 96.505 1.001 1800
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 4.2 and DIC = 93.6
## DIC is an estimate of expected predictive error (lower deviance is better).
x.1 <- log(df_stars_quad$Temperature..K.,base=10)
x.2 <- x.1^2
quadModel <- lm(Absolute.magnitude.Mv.~ x.1+x.2 , data=df_stars_quad)
summary(quadModel)
##
## Call:
## lm(formula = Absolute.magnitude.Mv. ~ x.1 + x.2, data = df_stars_quad)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.35813 -0.25115 -0.02477 0.60648 1.24166
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 219.855 26.291 8.362 4.73e-10 ***
## x.1 -95.850 12.816 -7.479 6.58e-09 ***
## x.2 10.230 1.555 6.578 1.04e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7317 on 37 degrees of freedom
## Multiple R-squared: 0.9611, Adjusted R-squared: 0.959
## F-statistic: 457.1 on 2 and 37 DF, p-value: < 2.2e-16
Classification: The Logit-Binomial Model
\[\begin{align*}
Y_i \sim Bern(p_i)
\end{align*}\]
\[\begin{align*}
logit(p_i) = \log\Big(\frac{p_i}{1-p_i}\Big) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4
\end{align*}\]
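For intuition, the logit link and its inverse are available in base R as qlogis and plogis:

```r
# logit and inverse logit via base R: qlogis(p) = log(p/(1-p)),
# plogis(eta) = 1/(1+exp(-eta)).
p <- 0.73
eta <- qlogis(p)  # the linear predictor that maps back to p
stopifnot(isTRUE(all.equal(plogis(eta), p)))
plogis(0)  # 0.5: a linear predictor of zero maps to probability one half
```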
model{
  # Likelihood
  for (i in 1:N){
    y[i] ~ dbern(p[i])
    logit(p[i]) <- (beta0+beta1*(x1[i])+beta2*(x2[i])+beta3*(x3[i])+beta4*(x4[i]))
  }
  beta0 ~ dunif(min(x1),max(x1))
  beta1 ~ dunif(min(x1),max(x1))
  beta2 ~ dunif(min(x2),max(x2))
  beta3 ~ dunif(min(x3),max(x3))
  beta4 ~ dnorm(0.0, 1.0E+2)
}
Heidel Test Model III
coda::heidel.diag(coda.fit.logit)
## [[1]]
##
## Stationarity start p-value
## test iteration
## beta0 passed 1 0.2868
## beta1 passed 1 0.4661
## beta2 passed 1 0.0884
## beta3 passed 1 0.3318
## beta4 passed 1 0.2176
## deviance passed 1 0.8075
##
## Halfwidth Mean Halfwidth
## test
## beta0 passed 3.658 0.01461
## beta1 passed 3.493 0.00475
## beta2 passed 5.770 0.00885
## beta3 passed 1.451 0.03348
## beta4 passed -0.433 0.00411
## deviance passed 13.434 0.30534
##
## [[2]]
##
## Stationarity start p-value
## test iteration
## beta0 passed 1 0.8930
## beta1 passed 1 0.0839
## beta2 passed 91 0.2638
## beta3 passed 1 0.1630
## beta4 passed 181 0.2046
## deviance passed 1 0.4342
##
## Halfwidth Mean Halfwidth
## test
## beta0 passed 3.666 0.01452
## beta1 passed 3.492 0.00454
## beta2 passed 5.765 0.00989
## beta3 passed 1.480 0.03297
## beta4 passed -0.435 0.00462
## deviance passed 13.452 0.30641
Classification: The Probit-Binomial Model
model{
  # Likelihood
  for (i in 1:N){
    y[i] ~ dbern(p[i])
    probit(p[i]) <- (beta0+(beta1*x1[i]*x2[i])+(beta2*x3[i]*x4[i]))
  }
  beta0 ~ dnorm(0.0, 10)
  beta1 ~ dnorm(0.0, 100)
  beta2 ~ dnorm(0.0, 10)
}
Heidel Test Model IV
coda::heidel.diag(coda.probit.fit)
## [[1]]
##
## Stationarity start p-value
## test iteration
## beta0 passed 1 0.179
## beta1 passed 1 0.462
## beta2 passed 1 0.480
## deviance passed 1 0.473
##
## Halfwidth Mean Halfwidth
## test
## beta0 passed 0.408 0.02079
## beta1 passed 0.230 0.00587
## beta2 passed 0.167 0.01049
## deviance passed 2.048 0.14314
##
## [[2]]
##
## Stationarity start p-value
## test iteration
## beta0 passed 1 0.238
## beta1 passed 1 0.682
## beta2 passed 1 0.891
## deviance passed 1 0.322
##
## Halfwidth Mean Halfwidth
## test
## beta0 passed 0.418 0.02287
## beta1 passed 0.234 0.00564
## beta2 passed 0.169 0.01027
## deviance passed 1.962 0.15352
Predictions
beta0 = 0.676
beta1 = 0.242
beta2 = 0.198
df_stars_test_life <- stars.df.sorted[stars.df.sorted$Star.type == '0',]
label="avanzato"
accuracy <- function(dataset, star.type, label) {
  df_stars_test_life <- dataset[dataset$Star.type == star.type,]
  x1 <- log(df_stars_test_life$Temperature..K., base = 10)
  x2 <- log(df_stars_test_life$Luminosity.L.Lo., base = 10)
  x3 <- log(df_stars_test_life$Radius.R.Ro., base = 10)
  x4 <- df_stars_test_life$Absolute.magnitude.Mv.
  #######################
  o <- beta0 + beta1*(x1*x2) + beta2*(x3*x4)
  o_perc <- 1/(1+exp(-o))
  #######################
  pred <- c()
  count_0 <- 0
  count_1 <- 0
  for (i in 1:length(o_perc)){
    if (o_perc[i] < 0.5) {
      pred[i] <- 0
      count_0 <- count_0 + 1
    } else {
      pred[i] <- 1
      count_1 <- count_1 + 1
    }
  }
  if (label == "avanzato") {            # "advanced" state: each class has 40 stars
    acc <- count_1/40
  } else if (label == "primordiale") {  # "primordial" state
    acc <- count_0/40
  }
  return(acc)
}
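Note that the model above uses a probit link, while accuracy maps the linear predictor through the logistic function; the probit inverse link in R is pnorm. A quick check (illustrative values) shows the two links differ in the probabilities they assign but agree on the 0/1 decision at a 0.5 threshold, since both cross 0.5 at o = 0:

```r
# Probit inverse link (pnorm) vs the logistic used in accuracy():
# both map o = 0 to 0.5, so a 0.5-threshold classifier agrees under either.
o <- seq(-3, 3, by = 0.5)
p_probit <- pnorm(o)
p_logit  <- 1/(1 + exp(-o))
stopifnot(all((p_probit >= 0.5) == (p_logit >= 0.5)))
max(abs(p_probit - p_logit))  # the links differ in probability, not in the decision
```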
primordiale.df <- data.frame(
"label.type" = c(0,1,2,3,4,5),
"acc.type" = c(1,1,1,0.05,0,0.55)
)
avanzato.df <- data.frame(
"label.type" = c(0,1,2,3,4,5),
"acc.type" = c(0,0,0,0.95,1,0.45)
)
acc.plt1 <- ggplot(primordiale.df, aes(x=label.type, y=acc.type, group=4))+ geom_line(linetype = "dashed")+geom_point(color="red")+
ggtitle("Classification Accuracy to Primordial Stars State") +
xlab("Star Type") + ylab("Accuracy")
acc.plt2 <- ggplot(avanzato.df, aes(x=label.type, y=acc.type, group=4))+ geom_line(linetype = "dashed")+geom_point(color="red")+
ggtitle("Classification Accuracy to Advanced Stars State") +
xlab("Star Type") + ylab("Accuracy")
accuracy(dataset=stars.df.sorted,star.type=c("5"),label="avanzato")
## [1] 0.45
grid.arrange(acc.plt1,acc.plt2, ncol=2)
